1. What's PMCSched?

PMCSched is an open-source, OS-oriented framework for developing scheduling algorithms. Its source code is available on GitHub (link), and it was introduced as part of the PMCTrack public release v3.0.

PMCSched is a framework for the Linux kernel that enables rapid development of the OS-level support required to create custom scheduling and resource-management schemes on both symmetric multicore systems and asymmetric multicore processors (AMPs). Unlike other existing frameworks that require patching the Linux kernel to function, PMCSched makes it possible to incorporate new scheduling-related OS-level support in Linux via a kernel module that can be loaded in unmodified kernels, making its adoption easier in production systems. Notably, the main focus of this framework is to simplify the creation of novel scheduling and resource-management strategies that are either implemented entirely in the OS kernel or require changes in different layers of the system software, so as to benefit from coordinated decisions between the runtime system and the OS scheduler.

PMCSched was born as a continuation of PMCTrack, an OS-oriented performance monitoring tool for Linux. With the PMCSched framework we take PMCTrack’s potential one step further by enabling rapid development of OS support for scheduling and resource management for Linux within a loadable kernel module. The diagram below illustrates the addition of PMCSched into the PMCTrack design.


Addition of PMCSched into PMCTrack


2. Developing an example plugin

PMCSched is built on top of PMCTrack, so the first step is to build and install PMCTrack. Instructions for that can be found here. Reading those instructions is highly encouraged, since they include insights on the use of PMCTrack from user space, from the OS scheduler, and from PMCTrack monitoring modules.


2.1. Defining the new plugin

Once PMCTrack is installed we can go ahead and start creating our custom scheduling algorithm. A new scheduling or resource-management algorithm is implemented by creating a scheduling plugin. We start by defining our new plugin in PMCSched's main header, pmcsched.h. In this walkthrough the new plugin has the ID SCHED_EXAMPLE, which is added to the enum of policies. The plugin also has to be included in the array of available schedulers, in that same header.

This takes three steps. First, we add the plugin to the enum of possible plugins. Second, we declare the plugin as extern, since we will use a separate file to define its functions. Finally, we include it in the array of possible plugins.
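As a rough illustration of these three steps, the following userspace sketch models the enum, the extern declaration, and the array of plugins. Only SCHED_EXAMPLE comes from the text; the type names, SCHED_DUMMY, NUM_SCHEDULERS, the descriptor fields, and find_scheduler() are assumptions made for this example and are not taken from pmcsched.h.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: add the new ID to the enum of policies. */
typedef enum {
    SCHED_DUMMY = 0,   /* hypothetical pre-existing policy */
    SCHED_EXAMPLE,     /* ID of our new plugin (named in the text) */
    NUM_SCHEDULERS     /* hypothetical sentinel: number of policies */
} sched_policy_t;

/* Hypothetical, heavily reduced plugin descriptor. */
typedef struct sched_plugin {
    sched_policy_t policy;
    const char *description;
} sched_plugin_t;

/* Step 2: declare the plugins as extern, since each one is meant to be
 * defined in its own source file. */
extern sched_plugin_t dummy_plugin;
extern sched_plugin_t example_plugin;

/* Step 3: include the new plugin in the array of available plugins,
 * indexed by policy ID. */
static sched_plugin_t *const available_schedulers[NUM_SCHEDULERS] = {
    [SCHED_DUMMY]   = &dummy_plugin,
    [SCHED_EXAMPLE] = &example_plugin,
};

/* In a real layout these definitions would live in separate .c files
 * (example.c for our plugin); they are here to keep the sketch whole. */
sched_plugin_t dummy_plugin   = { SCHED_DUMMY,   "Placeholder policy" };
sched_plugin_t example_plugin = { SCHED_EXAMPLE, "Example plugin" };

/* Hypothetical helper: look a plugin up by its ID. */
static sched_plugin_t *find_scheduler(int policy)
{
    if (policy < 0 || policy >= NUM_SCHEDULERS)
        return NULL;
    /* Every valid ID is expected to have an entry in the array. */
    assert(available_schedulers[policy] != NULL);
    return available_schedulers[policy];
}
```

Indexing the array by the enum value keeps the ID and the registration in sync: forgetting step 3 leaves a NULL slot, which the assertion in the lookup helper would catch.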

After that, we can start developing our new plugin right away, in a new file example.c. Creating a plugin works similarly to writing a Linux kernel module: the plugin follows a "contract" and implements a number of functions that every plugin should have, each with specific function attributes and each intended to handle a specific event.

The various algorithm-specific operations are invoked from the core part of the scheduling framework when a key scheduling-related event occurs, such as when a thread enters the system, terminates, becomes runnable/non-runnable, or when tick processing is due to update statistics. The framework also provides a set of callbacks to carry out periodic scheduling activations from interrupt (timer) and process (kernel thread) context on each core group separately, thus making it possible to invoke a wide range of blocking and non-blocking scheduling-related kernel API calls, such as those to map a thread to a specific CPU or core group. This modular approach to creating scheduling algorithms resembles the one used by scheduling classes (algorithms) inside the Linux kernel, but with a striking advantage: PMCSched scheduling plugins can be bundled in a kernel module that can be loaded on unmodified kernels.

Let's go ahead and prepare the basic events that our scheduling plugin should have logic to react to:

  1. Task becomes active.
  2. Task becomes inactive.
  3. Task exits the CPU.
  4. Periodic kthread calls the plugin.

There are also some events that our plugin could optionally track. Some examples are:

  1. Task forked.
  2. Task migrated.
  3. A defined profiling event takes place.
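One way to picture the required and optional events listed above is a table of callbacks in which the optional entries may be left empty. The sketch below is a userspace model under that assumption: the struct layout, the notify_* helpers, and on_exit_thread_example() are hypothetical, and only the on_active_thread_example(), on_inactive_thread_example(), and sched_kthread_periodic_example() names appear in this document.

```c
#include <assert.h>
#include <stddef.h>

typedef struct task { int pid; } task_t;   /* hypothetical task handle */

/* Hypothetical callback table: four required handlers, two optional. */
typedef struct plugin_ops {
    void (*on_active_thread)(task_t *t);        /* required */
    void (*on_inactive_thread)(task_t *t);      /* required */
    void (*on_exit_thread)(task_t *t);          /* required */
    void (*sched_kthread_periodic)(void);       /* required */
    void (*on_fork_thread)(task_t *t);          /* optional, may be NULL */
    void (*on_migrate_thread)(task_t *t, int prev_cpu, int new_cpu); /* optional */
} plugin_ops_t;

/* A plugin honours the contract when every required handler is set. */
static int plugin_is_complete(const plugin_ops_t *ops)
{
    return ops->on_active_thread && ops->on_inactive_thread &&
           ops->on_exit_thread && ops->sched_kthread_periodic;
}

/* Required event: the handler is always invoked. */
static void notify_active(const plugin_ops_t *ops, task_t *t)
{
    assert(plugin_is_complete(ops));
    ops->on_active_thread(t);
}

/* Optional event: skipped when the plugin does not track it.
 * Returns 1 if a handler ran, 0 otherwise. */
static int notify_fork(const plugin_ops_t *ops, task_t *t)
{
    if (!ops->on_fork_thread)
        return 0;
    ops->on_fork_thread(t);
    return 1;
}

/* Minimal example plugin that only counts activations. */
static int active_events;
static void on_active_thread_example(task_t *t)   { (void)t; active_events++; }
static void on_inactive_thread_example(task_t *t) { (void)t; }
static void on_exit_thread_example(task_t *t)     { (void)t; }
static void sched_kthread_periodic_example(void)  { }

static const plugin_ops_t example_ops = {
    .on_active_thread       = on_active_thread_example,
    .on_inactive_thread     = on_inactive_thread_example,
    .on_exit_thread         = on_exit_thread_example,
    .sched_kthread_periodic = sched_kthread_periodic_example,
    /* optional handlers intentionally left NULL */
};
```

Leaving optional handlers as NULL keeps a minimal plugin short, at the cost of one pointer check per optional event in the dispatcher.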

The plugin definition lists the minimum required functions along with a policy ID, optional flags, and a string description. It goes into our new file:

Some of the fields of our plugin are conceived to ease development from user space. PMCSched exposes an entry in the /proc filesystem that includes information on the available schedulers, their descriptions, and their IDs. These configuration files can also be written to (for example with echo) to enable or disable verbose mode. Some places within PMCSched print meaningful information when verbose mode is active. In general, PMCSched uses trace_printk() to output debugging information, but plugins can and should check active_scheduler_verbose before printing information with printk.
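The verbose check described above can be sketched in user space as a small logging helper. This is only a model: active_scheduler_verbose is named in the text, but its type and storage here are assumptions, plugin_log() is a hypothetical helper, and vfprintf() stands in for printk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Name taken from the text; declaring it as a plain int is an assumption. */
static int active_scheduler_verbose = 0;

/* Hypothetical helper: print only when verbose mode is enabled.
 * Returns the number of characters written, or 0 when output is muted. */
static int plugin_log(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(out != NULL && fmt != NULL);
    if (!active_scheduler_verbose)
        return 0;

    va_start(ap, fmt);
    written = vfprintf(out, fmt, ap);   /* stand-in for printk */
    va_end(ap);
    return written;
}
```

Routing every message through one helper means a plugin cannot forget the verbose check at an individual call site.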

Once we have defined and added our new plugin, we can start developing the functions that handle the events listed above.


2.2. Implementing the logic of the new plugin

We can now start implementing the logic for our example plugin. Let us, for example, implement something that resembles a Round Robin approach.

When the task becomes active, function on_active_thread_example() will be called. We can insert the task into global linked lists prepared to keep track of active, stopped, or all threads. If the task that just became active was already in the global queue of active threads, nothing has to be done. However, if it was in the global queue of stopped threads, or in no queue whatsoever, it will have to wait for its turn to be moved into the active threads, and so it should be stopped. Further, the plugin cannot assume that the task has no signal pending (signal states are a complex topic that would require a separate discussion).

When the task becomes inactive, function on_inactive_thread_example() will be called. The logic will resemble the activation case, but inverted.
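The activation and deactivation rules above can be modelled in user space as a small state machine. In this sketch the queue membership is a field on the thread and the lists are reduced to counters; the types, the halted flag (which stands in for sending a stop signal), and the reading of "inverted" as "leave whichever list holds the thread" are all assumptions, and the real callback signatures may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { Q_NONE, Q_ACTIVE, Q_STOPPED } queue_id_t;

/* Hypothetical per-thread bookkeeping. */
typedef struct thread {
    int pid;
    queue_id_t queue;  /* which global list currently holds the thread */
    int halted;        /* 1 once the plugin has asked the thread to stop */
} thread_t;

/* The global lists are reduced to their lengths for this sketch. */
static int n_active, n_stopped;

/* Thread became active (runnable). */
static void on_active_thread_example(thread_t *t)
{
    assert(t != NULL);
    if (t->queue == Q_ACTIVE)
        return;                 /* already scheduled: nothing to do */
    if (t->queue == Q_NONE)
        n_stopped++;            /* newcomer joins the stopped list */
    t->queue = Q_STOPPED;       /* it must wait for its turn ... */
    t->halted = 1;              /* ... so it is stopped (signal stand-in) */
}

/* Thread became inactive: it leaves whichever list it was in. */
static void on_inactive_thread_example(thread_t *t)
{
    assert(t != NULL);
    if (t->queue == Q_ACTIVE)
        n_active--;
    else if (t->queue == Q_STOPPED)
        n_stopped--;
    t->queue = Q_NONE;
    t->halted = 0;
}
```

Note that the sketch ignores pending signals entirely, which a real plugin cannot do.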

When the periodic kthread is triggered, function sched_kthread_periodic_example() will be called. This is the function that actually implements the scheduling algorithm, using the global linked lists and changing which threads are allowed to run at a given time. First, we check whether there are any stopped threads (otherwise there is no point in stopping running threads):

At this point, we have to make the first thread in the list of stopped threads (remember, this is Round Robin) start running; we can use send_signal for that. The thread is then moved to the tail of the active threads and, since at least one of its threads is now running, its app is moved into the active apps list. In turn, the head of the active threads is stopped (again with a signal) and moved to the tail of the stopped threads. If that was the last running thread of its app, the app is moved into the stopped apps list. We look for the first runnable stopped thread and, if there is one, we go ahead and perform the switch:

We check that the list of active threads holds more than one thread, as we don't want to kick out the thread we just added:
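A minimal userspace sketch of this rotation is shown below, assuming the two lists are simple FIFO queues of thread IDs. It leaves out the signals, the "runnable" check, and the per-app lists, and the fifo_t type and rr_periodic_step() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_THREADS 8

/* Hypothetical fixed-size FIFO standing in for a kernel linked list. */
typedef struct fifo {
    int pid[MAX_THREADS];
    int len;
} fifo_t;

static void fifo_push_tail(fifo_t *q, int pid)
{
    assert(q->len < MAX_THREADS);
    q->pid[q->len++] = pid;
}

static int fifo_pop_head(fifo_t *q)
{
    int pid;

    assert(q->len > 0);
    pid = q->pid[0];
    /* Shift the remaining entries one slot towards the head. */
    memmove(&q->pid[0], &q->pid[1], (size_t)(q->len - 1) * sizeof(int));
    q->len--;
    return pid;
}

/* One periodic activation of the Round Robin sketch.
 * Returns 1 if a stopped thread was resumed, 0 if nobody was waiting. */
static int rr_periodic_step(fifo_t *active, fifo_t *stopped)
{
    if (stopped->len == 0)
        return 0;   /* no stopped threads: leave running threads alone */

    /* Resume the first stopped thread: it joins the tail of the actives. */
    fifo_push_tail(active, fifo_pop_head(stopped));

    /* Only preempt when more than one thread is active, so the thread
     * that was just resumed is never the one being kicked out. */
    if (active->len > 1)
        fifo_push_tail(stopped, fifo_pop_head(active));

    return 1;
}
```

Starting from active = {1} and stopped = {2, 3}, one step leaves active = {2} and stopped = {3, 1}: thread 2 gets its turn and thread 1 goes to the back of the waiting line.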

To illustrate with an example, let us also set the CPU mask. For instance, we will set the mask to core 1 if "schedulings" is a multiple of 5, or to core 0 otherwise. We will also check that the system has at least two logical cores.

And voilà! We have a very basic example scheduler plugin finished. We just have to remember to add it to the Makefile of our target architecture. For example, for an Intel system, that is src/modules/pmcs/intel-core/Makefile.
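Returning to the core-selection rule described above, it can be written as a small pure function. The sketch below is a userspace model: pick_core() and core_to_mask() are hypothetical helpers, the mask is a plain bitmask, and kernel code would go through the cpumask API instead.

```c
#include <assert.h>

/* Choose the target core: core 1 when the number of scheduling
 * activations is a multiple of 5, core 0 otherwise. On a system with
 * fewer than two logical cores, always fall back to core 0. */
static int pick_core(unsigned long schedulings, int nr_cpus)
{
    assert(nr_cpus >= 1);
    if (nr_cpus < 2)
        return 0;
    return (schedulings % 5 == 0) ? 1 : 0;
}

/* Turn a core index into a one-bit affinity mask (bit N means core N). */
static unsigned long core_to_mask(int core)
{
    assert(core >= 0);
    return 1UL << core;
}
```

For example, pick_core(10, 4) selects core 1 and core_to_mask() turns that into the mask 0x2, whereas pick_core(10, 1) stays on core 0 because there is no second core to move to.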


3. Leveraging PMCs

Arguably, one of PMCSched's coolest features is its ability to collect information regarding Performance Monitoring Counters (PMCs), using the APIs provided by PMCTrack.

You can configure your plugin to collect certain metrics and events, such as instruction count, cycles, LLC misses, and LLC references. This is particularly useful for profiling applications as they enter the system.

Let us illustrate how to collect a number of interesting PMC-based metrics:
  1. Instructions per cycle (normalized).
  2. LLC accesses per instruction (normalized).
  3. LLC misses per instruction (normalized).
  4. LLC misses per cycle (normalized).

Firstly, we need to prepare the descriptors for the various performance metrics, using a number of metrics and their indexes that we have to define upfront. Next, we can prepare the pmcsched_counter_config_t, which is the configuration exposed to PMCTrack and which we pass as part of the plugin definition through the field counter_config. In our example, we set the profiling mode to TBS_SCHED_MODE (time-based sampling), as opposed to event-based sampling with EBS_SCHED_MODE. We also specify that, for every new sample collected, we want PMCSched to call our plugin's function profile_thread_example(). The profiling function can then update global instruction counters from the sample and use this information to decide, depending on the algorithm, how to classify the application.
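To make the four metrics concrete, the sketch below derives them from one sample of raw counts in a userspace model of the profiling callback. The struct layouts, the scale factor of 1000 (an integer fixed-point choice, since kernel code generally avoids floating point), and the signature given to profile_thread_example() are assumptions for this example; the real PMCSched and PMCTrack types are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define METRIC_SCALE 1000ULL   /* assumed fixed-point scale for "normalized" */

/* Hypothetical raw sample: one value per collected event. */
typedef struct pmc_sample {
    uint64_t instructions;
    uint64_t cycles;
    uint64_t llc_references;
    uint64_t llc_misses;
} pmc_sample_t;

/* Hypothetical per-thread metrics, each scaled by METRIC_SCALE. */
typedef struct thread_metrics {
    uint64_t ipc;                   /* instructions per cycle */
    uint64_t llc_refs_per_instr;    /* LLC accesses per instruction */
    uint64_t llc_misses_per_instr;  /* LLC misses per instruction */
    uint64_t llc_misses_per_cycle;  /* LLC misses per cycle */
} thread_metrics_t;

/* Global instruction counter updated from every sample. */
static uint64_t total_instructions;

/* Integer ratio scaled by METRIC_SCALE; 0 when the denominator is 0. */
static uint64_t scaled_ratio(uint64_t num, uint64_t den)
{
    return den ? (num * METRIC_SCALE) / den : 0;
}

/* Model of the per-sample profiling callback (signature assumed). */
static void profile_thread_example(const pmc_sample_t *s, thread_metrics_t *m)
{
    assert(s && m);
    total_instructions += s->instructions;
    m->ipc                  = scaled_ratio(s->instructions, s->cycles);
    m->llc_refs_per_instr   = scaled_ratio(s->llc_references, s->instructions);
    m->llc_misses_per_instr = scaled_ratio(s->llc_misses, s->instructions);
    m->llc_misses_per_cycle = scaled_ratio(s->llc_misses, s->cycles);
}
```

With 2000 instructions, 1000 cycles, 100 LLC references, and 50 LLC misses, the scaled values are 2000, 50, 25, and 50, that is, an IPC of 2.0, 0.05 accesses and 0.025 misses per instruction, and 0.05 misses per cycle. A classification step could then compare these values against algorithm-specific thresholds.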


4. Publications

Here's a list of the most recent publications related to PMCSched:
  1. Rapid development of OS support with PMCSched for scheduling on asymmetric multicore systems - 20th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms - HeteroPar'22 (at Euro-Par)


5. Contact

You can contact the two main project contributors:
  1. Juan Carlos Sáez Alcaide (jcsaezal at ucm.es)
  2. Carlos Bilbao (cbilbao at ucm.es)

PMCSched: Scheduling algorithms made easy - Online documentation and introduction

Authors: Juan Carlos Sáez Alcaide and Carlos Bilbao
